24 research outputs found

    Framing Apache Spark in life sciences

    Advances in high-throughput and digital technologies have required the adoption of big data for handling complex tasks in life sciences. However, the shift to big data has confronted researchers with technical and infrastructural challenges in storing, sharing, and analysing the data. Such tasks require distributed computing systems and algorithms that ensure efficient processing. Cutting-edge distributed programming frameworks make it possible to implement flexible algorithms that adapt the computation to the data on on-premise HPC clusters or cloud architectures. In this context, Apache Spark is a very powerful HPC engine for large-scale data processing on clusters. Thanks also to specialised libraries for working with structured and relational data, it supports machine learning, graph-based computation, and stream processing. This review article aims to help life sciences researchers ascertain the features of Apache Spark and assess whether it can be successfully used in their research activities.

    gene relevance based on multiple evidences in complex networks

    Motivation: Multi-omics approaches offer the opportunity to reconstruct a more complete picture of the molecular events associated with human diseases, but pose challenges in data analysis. Network-based methods for the analysis of multi-omics data leverage the complex web of macromolecular interactions occurring within cells to extract significant patterns of molecular alterations. Existing network-based approaches typically address specific combinations of omics and are limited in the number of layers that can be jointly analysed. In this study, we investigate the application of network diffusion to quantify gene relevance on the basis of multiple lines of evidence (layers). Results: We introduce a gene score (mND) that quantifies the relevance of a gene in a biological process, taking into account the network proximity of the gene and its first neighbours to other altered genes. We show that mND performs better than existing methods in finding altered genes in network proximity in one or more layers. We also report good performance in recovering known cancer genes. The pipeline described in this article is broadly applicable because it can handle different types of input: multi-omics datasets, datasets that are stratified in many classes (e.g., cell clusters emerging from single-cell analyses), or a combination of the two. Availability and implementation: The R package 'mND' is available at https://www.itb.cnr.it/mnd. Supplementary information: Supplementary data are available at Bioinformatics online.
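    The network diffusion idea mentioned in this abstract can be illustrated with a minimal sketch. This is a generic diffusion with restart on a toy graph, not the mND score itself, whose exact definition is not given here. The function name `diffuse`, the damping parameter `alpha`, and the symmetric degree normalisation are assumptions made for illustration.

```python
def diffuse(adj, x0, alpha=0.7, tol=1e-9, max_iter=1000):
    """Generic network diffusion (illustrative, not the mND implementation).

    adj   : symmetric adjacency matrix as a list of lists of 0/1
    x0    : initial evidence per gene (e.g. 1.0 for altered genes, else 0.0)
    alpha : assumed fraction of signal passed to neighbours at each step
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Symmetric normalisation: W[i][j] = A[i][j] / sqrt(d_i * d_j)
    w = [
        [adj[i][j] / (deg[i] * deg[j]) ** 0.5 if adj[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    x = list(x0)
    for _ in range(max_iter):
        # Spread the current scores to neighbours, then re-inject the evidence
        nxt = [
            alpha * sum(w[i][j] * x[j] for j in range(n)) + (1 - alpha) * x0[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Toy path graph 0 - 1 - 2 - 3 with a single altered gene at node 0
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = diffuse(path, [1.0, 0.0, 0.0, 0.0])
```

    In this toy example the diffused score decreases with distance from the altered gene, which is the sense in which diffusion captures network proximity. With several layers, one such vector would be computed per layer before any combination step.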

    Removing duplicate reads using graphics processing units

    Background: During library construction, polymerase chain reaction is used to enrich the DNA before sequencing. Typically, this process generates duplicate read sequences. Removal of these artifacts is mandatory, as they can affect the correct interpretation of data in several analyses. Ideally, duplicate reads should be characterized by identical nucleotide sequences. However, due to sequencing errors, duplicates may also be nearly identical. Removing nearly identical duplicates can require a notable computational effort. To deal with this challenge, we recently proposed a GPU method aimed at removing identical and nearly identical duplicates generated with an Illumina platform. The method implements an approach based on prefix-suffix comparison. Read sequences with identical prefixes are considered potential duplicates. Then, their suffixes are compared to identify and remove those that are actually duplicated. Although the method can be efficiently used to remove duplicates, there are some limitations that need to be overcome. In particular, it cannot detect potential duplicates when prefixes are longer than 27 bases, and it does not support paired-end read libraries. Moreover, large clusters of potential duplicates are split into smaller ones to guarantee a reasonable computing time. This heuristic may affect the accuracy of the analysis. Results: In this work we propose GPU-DupRemoval, a new implementation of our method able to (i) cluster reads without constraints on the maximum length of the prefixes, (ii) support both single- and paired-end read libraries, and (iii) analyze large clusters of potential duplicates. Conclusions: Due to the massive parallelization obtained by exploiting graphics cards, GPU-DupRemoval removes duplicate reads faster than other cutting-edge solutions, while outperforming most of them in terms of the amount of duplicate reads.
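    The prefix-suffix comparison described in this abstract can be sketched on the CPU as follows. This is a simplified illustration, not the GPU-DupRemoval implementation. The names `remove_duplicates`, `prefix_len`, and `max_mismatches`, and the use of Hamming distance for "nearly identical" suffixes, are assumptions.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_duplicates(reads, prefix_len=4, max_mismatches=1):
    """Keep one representative per group of identical or nearly identical reads."""
    # Step 1: reads sharing an identical prefix are potential duplicates.
    clusters = defaultdict(list)
    for read in reads:
        clusters[read[:prefix_len]].append(read)

    kept = []
    # Step 2: within each cluster, compare suffixes against the reads already kept.
    for cluster in clusters.values():
        representatives = []
        for read in cluster:
            suffix = read[prefix_len:]
            is_duplicate = any(
                len(rep) == len(read)
                and hamming(rep[prefix_len:], suffix) <= max_mismatches
                for rep in representatives
            )
            if not is_duplicate:
                representatives.append(read)
        kept.extend(representatives)
    return kept


reads = ["ACGTAAAA", "ACGTAAAT", "ACGTTTTT", "GGGGAAAA"]
unique_reads = remove_duplicates(reads)
```

    Here `ACGTAAAT` is dropped because it shares the prefix `ACGT` with `ACGTAAAA` and its suffix differs in a single base. The suffix comparisons inside each cluster are independent of other clusters, which is the part that lends itself to massive parallelization.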

    A sex-informed approach to improve the personalised decision making process in myelodysplastic syndromes: a multicentre, observational cohort study

    Background: Sex is a major source of diversity among patients and a sex-informed approach is becoming a new paradigm in precision medicine. We aimed to describe sex diversity in myelodysplastic syndromes in terms of disease genotype, phenotype, and clinical outcome. Moreover, we sought to incorporate sex information into the clinical decision-making process as a fundamental component of patient individuality. Methods: In this multicentre, observational cohort study, we retrospectively analysed 13 284 patients aged 18 years or older with a diagnosis of myelodysplastic syndrome according to 2016 WHO criteria included in the EuroMDS network (n=2025), International Working Group for Prognosis in MDS (IWG-PM; n=2387), the Spanish Group of Myelodysplastic Syndromes registry (GESMD; n=7687), or the Dusseldorf MDS registry (n=1185). Recruitment periods for these cohorts were between 1990 and 2016. The correlation between sex and genomic features was analysed in the EuroMDS cohort and validated in the IWG-PM cohort. The effect of sex on clinical outcome, with overall survival as the main endpoint, was analysed in the EuroMDS population and validated in the other three cohorts. Finally, novel prognostic models incorporating sex and genomic information were built and validated, and compared to the widely used revised International Prognostic Scoring System (IPSS-R). This study is registered with ClinicalTrials.gov, NCT04889729. Findings: The study included 7792 (58.7%) men and 5492 (41.3%) women. 10 906 (82.1%) patients were White, and race was not reported for 2378 (17.9%) patients.
Sex biases were observed at the single-gene level with mutations in seven genes enriched in men (ASXL1, SRSF2, and ZRSR2 p<0.0001 in both cohorts; DDX41 not available in the EuroMDS cohort vs p=0.0062 in the IWG-PM cohort; IDH2 p<0.0001 in EuroMDS vs p=0.042 in IWG-PM; TET2 p=0.031 vs p=0.035; U2AF1 p=0.033 vs p<0.0001) and mutations in two genes were enriched in women (DNMT3A p<0.0001 in EuroMDS vs p=0.011 in IWG-PM; TP53 p=0.030 vs p=0.037). Additionally, sex biases were observed in co-mutational pathways of founding genomic lesions (splicing-related genes, predominantly in men, p<0.0001 in both the EuroMDS and IWG-PM cohorts), in DNA methylation (predominantly in men, p=0.046 in EuroMDS vs p<0.0001 in IWG-PM), and TP53 mutational pathways (predominantly in women, p=0.0073 in EuroMDS vs p<0.0001 in IWG-PM). In the retrospective EuroMDS cohort, men had worse median overall survival (81.3 months, 95% CI 70.4-95.0 in men vs 123.5 months, 104.5-127.5 in women; hazard ratio [HR] 1.40, 95% CI 1.26-1.52; p<0.0001). This result was confirmed in the prospective validation cohorts (median overall survival was 54.7 months, 95% CI 52.4-59.1 in men vs 74.4 months, 69.3-81.2 in women; HR 1.30, 95% CI 1.23-1.35; p<0.0001 in the GESMD registry; 40.0 months, 95% CI 33.4-43.7 in men vs 54.2 months, 38.6-63.8 in women; HR 1.23, 95% CI 1.08-1.36; p<0.0001 in the Dusseldorf MDS registry). We developed new personalised prognostic tools that included sex information (the sex-informed prognostic scoring system and the sex-informed genomic scoring system). Sex maintained independent prognostic power in all prognostic systems; the highest performance was observed in the model that included both sex and genomic information.
A five-to-five mapping between the IPSS-R and new score categories resulted in the re-stratification of 871 (43.0%) of 2025 patients from the EuroMDS cohort and 1003 (42.0%) of 2387 patients from the IWG-PM cohort by using the sex-informed prognostic scoring system, and of 1134 (56.0%) patients from the EuroMDS cohort and 1265 (53.0%) patients from the IWG-PM cohort by using the sex-informed genomic scoring system. We created a web portal that enables outcome predictions based on a sex-informed personalised approach. Interpretation: Our results suggest that a sex-informed approach can improve the personalised decision-making process in patients with myelodysplastic syndromes and should be considered in the design of clinical trials including low-risk patients. Copyright (c) 2022 Published by Elsevier Ltd. All rights reserved.

    Tenebre bianche. Immaginari coloniali fin de siècle

    As Joseph Conrad hints in his masterpiece, it is precisely at the end of the nineteenth century that the African Heart of Darkness casts its unsettling refractions on the imperial culture of the European nations. The project that the colonial imaginaries of France, Belgium, England, and Portugal help to layer into European culture is nothing other than the attempt to legitimise empire through the idea. The four essays collected in Tenebre Bianche set out to deconstruct the powerful mythographic devices, focusing on literary and photographic representations, that the nations of Europe bend to their own imperial cause. (back cover)

    Lipids around the Clock: Focus on Circadian Rhythms and Lipid Metabolism

    Disorders of lipid and lipoprotein metabolism and transport are responsible for the development of a large spectrum of pathologies, ranging from cardiovascular diseases to metabolic syndrome and even tumour development. Recently, a deeper knowledge of the molecular mechanisms that control our biological clock and circadian rhythms has been achieved. These studies have made clear that the molecular clock tightly regulates every aspect of our lives, including our metabolism. This review analyses the organisation and functioning of the circadian clock and its relevance in the regulation of physiological processes. We also describe the metabolism and transport of lipids and lipoproteins as essential to our health, and we focus on how closely the circadian clock and lipid metabolism are interconnected. Finally, we discuss how a deeper knowledge of this relationship might help to counter the recent spread of metabolic diseases.

    Network Diffusion-Based Prioritization of Autism Risk Genes Identifies Significantly Connected Gene Modules

    Autism spectrum disorder (ASD) is marked by strong genetic heterogeneity, which is underscored by the low overlap between ASD risk gene lists proposed in different studies. In this context, molecular networks can be used to analyze the results of several genome-wide studies in order to highlight those network regions harboring genetic variations associated with ASD, the so-called “disease modules.” In this work, we used a recent network diffusion-based approach to jointly analyze multiple ASD risk gene lists. We defined genome-scale prioritizations of human genes in relation to ASD genes from multiple studies, found significantly connected gene modules associated with ASD, and predicted genes functionally related to ASD risk genes. Most of them play a role in synapse and neuronal development and function; many are related to syndromes that can be comorbid with ASD, and the remainder are involved in epigenetics, cell cycle, cell adhesion, and cancer.

    M1 and M2 tumour-associated macrophages subsets in canine malignant mammary tumours: An immunohistochemical study

    Among the innate and adaptive immune cells recruited to the tumour site, tumour-associated macrophages (TAMs) are particularly abundant and, in a simplified scheme, can be classified into M1 and M2 TAMs. In the present study, we used immunohistochemistry to quantify ionized calcium binding adaptor molecule 1 (Iba1)-positive total TAMs and CD204-positive M2-polarized TAMs in 60 canine malignant mammary tumours (CMMTs), in order to analyse the relationship between the M1 or M2 response and the histopathologic features of the examined CMMTs, the dogs’ body condition score (BCS), and the progression of the neoplastic disease. The mean numbers of total and CD204-positive TAMs were significantly higher in solid and grade III carcinomas than in grade I and II carcinomas. Moreover, the mean number of CD204-positive TAMs was significantly higher in CMMTs with lymphatic invasion and necrosis than in CMMTs without. A higher number of CD204-positive M2-polarized TAMs was associated with a worse outcome of the neoplastic disease: bitches bearing CMMTs with a prevalent M2-polarized TAM response had a median cancer-specific survival time of 449 days, while in animals with an M1-polarized TAM response the median cancer-specific survival time was 1209 days. The results of our study confirm that in CMMTs the presence of an M2-polarized TAM response might affect tumour development and behaviour. Finally, they strongly suggest the potential of CD204 expression as a prognostic factor.

    An infrastructure for precision medicine through analysis of big data

    Background: The increasing availability of omics data, due to advances both in the acquisition of molecular biology results and in systems biology simulation technologies, provides the basis for precision medicine. Success in precision medicine depends on access to healthcare and biomedical data. To this end, the digitization of all clinical exams and medical records is becoming standard in hospitals. Digitization is essential to collect, share, and aggregate large volumes of heterogeneous data to support the discovery of hidden patterns, with the aim of defining predictive models for biomedical purposes. Sharing patients’ data is a critical process: it raises ethical, social, legal, and technological issues that must be properly addressed. Results: In this work, we present an infrastructure devised to deal with the integration of large volumes of heterogeneous biological data. The infrastructure was applied to the data collected between 2010 and 2016 in one of the major diagnostic analysis laboratories in Italy. Data from three different platforms were collected (i.e., laboratory exams, pathological anatomy exams, biopsy exams). The infrastructure has been designed to allow the extraction and aggregation of both unstructured and semi-structured data. Data are properly treated to ensure data security and privacy. Specialized algorithms have also been implemented to process the aggregated information, with the aim of obtaining a precise historical analysis of the clinical activities of one or more patients. Moreover, three Bayesian classifiers have been developed to analyze examinations reported as free text. Experimental results show that the classifiers exhibit good accuracy when used to analyze sentences related to the sample location, the presence of diseases, and the status of the illnesses.
Conclusions: The infrastructure allows the integration of multiple heterogeneous sources of anonymized data from the different clinical platforms. Both unstructured and semi-structured data are processed to obtain a precise historical analysis of the clinical activities of one or more patients. Data aggregation makes it possible to perform a series of statistical assessments required to answer complex questions in a variety of fields, such as predictive and precision medicine. In particular, studying the clinical history of patients who have developed similar pathologies can help to predict or identify markers that allow an early diagnosis of possible illnesses.